We consider 1-qubit mixed quantum state estimation by adaptively updating measurements according to previously obtained outcomes and measurement settings. Updates are determined by the average-variance-optimality (A-optimality) criterion, known in the classical theory of experimental design and applied here to quantum state estimation. In general, A-optimization is a nonlinear minimization problem; however, we find an analytic solution for 1-qubit state estimation using projective measurements, reducing computational effort. We compare numerically two adaptive and two nonadaptive schemes for finite data sets and show that the A-optimality criterion gives more precise estimates than standard quantum tomography.

I. INTRODUCTION

For successful experimental implementation of any quantum protocol, the quantum states and operations involved must be confirmed to be sufficiently close to their theoretical targets. One way to obtain such a confirmation is to perform another experiment and from the obtained data make an estimate of the quantum operator involved. Statistically, this is a constrained multiparameter estimation problem (the quantum estimation problem), where we assume we are given a finite number of identical copies of a quantum state or operation, we perform measurements whose mathematical description is assumed to be known, and from the outcome statistics we make our estimate. Due to the probabilistic behavior of the measurement outcomes and the finiteness of the number of measurement trials, there always exist statistical errors in any quantum estimate. The size of the error depends on the choice of measurements and the estimation procedure. In statistics, the former is called an experimental design, while the latter is called an estimator. It is, therefore, a key aim of both classical and quantum estimation theory to find a combination of experimental design and estimator which gives us more precise estimation results using fewer measurement trials.

A standard combination in quantum information ex- 
periments is that of quantum tomography and maxi- 
mum likelihood estimator. Although the term "quan- 
tum tomography" can be used in several different con- 
texts, we use it to mean an experimental design in which 
an independently and identically prepared set of measurements is used throughout the entire experiment [1]. The performance of different choices for the set of tomographic measurements has been studied in, for example, [2, 3]. This of course raises the question of the performance of adaptive experimental designs, in which the measurements performed from trial to trial are not independent, and are chosen according to previous measurement settings and the outcomes obtained. Clearly, adaptive experimental designs are a superset of the nonadaptive ones, and as such can potentially achieve higher performance.

Adaptive designs are characterized by the way in which 
measurements are related from trial to trial, referred to 
as an update criterion. Previously proposed update criteria include those based on asymptotic statistical estimation theory (Fisher information) [4-6], direct calculations of the estimates expected to be obtained in the next measurement [7, 8], mutually unbiased bases [9], as well as Bayesian estimators and Shannon entropy [7], [10], [11]. Theoretical investigations report that some of the proposed update criteria give more precise estimates than nonadaptive quantum tomography, and an experimental implementation of the update criterion proposed in [7] in an ion trap system has been performed [12]. If $N$ denotes the number of measurement trials and $N$ is sufficiently large, it is known in 1-qubit state estimation that the expectation value of infidelity averaged over states, a measure of the estimation error, can decrease at best as $O(N^{-3/4})$ in a nonadaptive experiment [13], compared to $O(N^{-1})$ in adaptive experiments [14]. Most of the proposed update criteria, however, have a high computational cost that makes real experiments infeasible. In this paper, we propose an adaptive experimental design whose average expected infidelity decreases as $O(N^{-1})$ and whose update criterion, known as average-variance optimality (A-optimality) in classical statistics, has low computational cost for 1-qubit state estimation.

The paper is structured as follows. In Sec. II we lay out the notation and terminology that will be used throughout this paper, by explaining basic concepts in adaptive experimental design, statistical parameter estimation, and A-optimality criteria. We also give a brief review of some of the proposed update criteria in the literature. In Sec. III we give the explicit form of the analytic solution of the A-optimal update criterion (the derivation is given in the Appendix). This analytic solution makes it possible to reduce the computational cost for updating measurements, and using this we compare several estimation schemes numerically, showing that our proposal is more precise than standard quantum tomography. In Sec. IV we discuss the feasibility of implementing the proposed scheme experimentally. A summary appears in Sec. V.



II. PRELIMINARIES 

A. Notation and terminology 

We will adopt terms from the statistical literature, 
since they afford us the precision we need to properly 
discuss details of estimation schemes that can sometimes 
be subtle. In this subsection we will introduce a formalism for quantum estimation using that terminology, and apply it in a survey of several existing update criteria in Sec. II E.



1. Model selection

In statistical estimation theory, a statistical model is defined as a set of probability distributions, and we assume that the true probability distribution of interest is included in the set. In the quantum case, a probability distribution is determined by the state of the system and the action of the measurement on the system. Let $\mathcal{H}$ be a Hilbert space with finite dimension $d$ and $\mathcal{S}(\mathcal{H})$ be the set of all density matrices acting on that Hilbert space. Suppose we know that the object we are trying to estimate lies in a subset $\mathcal{O} \subseteq \mathcal{S}(\mathcal{H})$, that is, the true density matrix $\rho$ is included in $\mathcal{O}$. For example, when we know that the true state is pure, $\mathcal{O}$ is the set of all pure states. In this paper, we consider mixed state estimation, and we assume that in our finite $N$ measurement trials we prepare identical copies of an unknown state $\rho \in \mathcal{O}$.



2. Experimental design

A probability distribution of outcomes in quantum measurement requires not only a density matrix, but also a positive operator valued measure (POVM), $\Pi = \{\Pi_x\}_{x \in \mathcal{X}}$, where $\mathcal{X}$ is the set of outcomes. When the measurement is characterized by a POVM $\Pi$ and the measured quantum state is characterized by a density matrix $\rho$, the probability distribution of the outcomes is given by Born's rule $p(x; \Pi|\rho) = \mathrm{Tr}[\Pi_x \rho]$, where Tr denotes the trace operation with respect to $\mathcal{H}$ (note that in the next subsection a different trace operation, represented as tr, is introduced).

We consider sequential measurements, as opposed to collective measurements, on copies of $\rho$. We will index measurement trials using subscripts $n \in \{1, 2, \ldots, N\}$, and sequences using superscripts. Thus, for some symbol $A$, $A_n$ is its value taken at the $n$-th trial, while $A^n$ is the sequence $\{A_1, A_2, \ldots, A_n\}$. We will also try to use calligraphic fonts for supersets. Adaptivity in our sense means that the POVM performed at the $(n+1)$-th trial can depend on all the previous $n$ trials' outcomes and POVMs.

The measurement class $\mathcal{M}_n$ is the set of POVMs which are available at the $n$-th trial. We choose the $n$-th POVM, $\Pi_n = \{\Pi_{n,x}\}_{x \in \mathcal{X}_n}$, from $\mathcal{M}_n$, where $\mathcal{X}_n$ denotes the set of measurement outcomes for the $n$-th trial. When it is independent of the trial, as is usually the case, we omit the index, using $\mathcal{M}$ for the measurement class and $\mathcal{X}$ for the outcome set. Let $x^n = \{x_1, \ldots, x_n\}$ denote the sequence of outcomes obtained up to the $n$-th trial, where $x_i \in \mathcal{X}_i$. We will denote the pair of measurement performed and outcome obtained by $D_n = (\Pi_n, x_n) \in \mathcal{D}_n := \mathcal{M}_n \times \mathcal{X}_n$, and refer to it as the data for trial $n$. The sequence of data up to trial $n$ is thus $D^n = \{D_1, \ldots, D_n\} \in \mathcal{D}^n := \times_{i=1}^{n} \mathcal{D}_i$. After the $n$-th measurement, we choose the next, $(n+1)$-th, POVM $\Pi_{n+1} = \{\Pi_{n+1,x}\}_{x \in \mathcal{X}_{n+1}}$ according to the previously obtained data. Let $u_n$ denote the map from the data to the next measurement, that is, $u_n : \mathcal{D}^{n-1} \to \mathcal{M}_n$, $\Pi_n = u_n(D^{n-1})$. We call $u_n$ the measurement update criterion for the $n$-th trial and $u^N := \{u_1, u_2, \ldots, u_N\}$ the measurement update rule. Note that $u_1$ is a map from $\emptyset$ to $\mathcal{M}_1$ and corresponds to the choice of the first measurement.



3. Estimator

An estimator $\rho^{\mathrm{est}} = \{\rho_1^{\mathrm{est}}, \ldots, \rho_N^{\mathrm{est}}\}$ is a set of maps from the data to the model space, $\rho_n^{\mathrm{est}} : \mathcal{D}^n \to \mathcal{O}$, so that $\rho_n^{\mathrm{est}}(D^n) \in \mathcal{O}$. The estimated density matrix $\rho_n^{\mathrm{est}}(D^n)$ is called the $n$-th estimate. We will often omit the data dependency. In this paper we use a maximum likelihood estimator $\rho^{\mathrm{ML}}$ defined as

$$\rho_n^{\mathrm{ML}}(D^n) := \operatorname*{argmax}_{\sigma \in \mathcal{O}} \, p(D^n|\sigma), \tag{1}$$

where

$$p(D^n|\sigma) := \prod_{i=1}^{n} \mathrm{Tr}[\Pi_{i,x_i} \sigma]. \tag{2}$$

A quintuplet $(\mathcal{O}, N, \mathcal{M}^N, u^N, \rho^{\mathrm{est}})$ specifies an estimation scheme. A sketch of the procedure for a generic adaptive quantum estimation scheme is given in Fig. 1.

FIG. 1. A sketch of a generic procedure for an adaptive quantum estimation scheme.
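
The generic loop of Fig. 1 can be illustrated with a short simulation. The following is a minimal sketch, not the scheme studied in this paper: it assumes a qubit measured projectively along a unit Bloch axis $a$, so that Born's rule reduces to $p(\pm) = (1 \pm s \cdot a)/2$, and the names `run_scheme`, `xyz_update_rule`, and `linear_inversion_estimator` are hypothetical placeholders for an update rule $u^N$ and an estimator. The placeholder estimator is a simple per-axis average rather than the maximum likelihood estimator of Eq. (1), so its output may lie outside the Bloch ball.

```python
import random

def born_probability_plus(axis, bloch):
    """Probability of the '+' outcome, (1 + s.a)/2, for a projective qubit measurement."""
    return 0.5 * (1.0 + sum(a * s for a, s in zip(axis, bloch)))

def xyz_update_rule(data):
    """Nonadaptive placeholder rule: cycle through the x, y, z axes, ignoring the data."""
    axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    return axes[len(data) % 3]

def linear_inversion_estimator(data):
    """Placeholder estimator: mean outcome per axis (valid only for x, y, z axes)."""
    sums, counts = [0.0] * 3, [0] * 3
    for axis, outcome in data:
        k = max(range(3), key=lambda i: abs(axis[i]))  # which coordinate axis was measured
        sums[k] += outcome
        counts[k] += 1
    return tuple(s / c if c else 0.0 for s, c in zip(sums, counts))

def run_scheme(true_bloch, n_trials, update_rule, estimator, rng):
    """Generic loop: choose a POVM from past data, measure, record, finally estimate."""
    data = []  # D^n = [(a_1, x_1), ..., (a_n, x_n)]
    for _ in range(n_trials):
        axis = update_rule(data)                      # Pi_n = u_n(D^{n-1})
        p_plus = born_probability_plus(axis, true_bloch)
        outcome = 1 if rng.random() < p_plus else -1  # sample x_n by Born's rule
        data.append((axis, outcome))
    return estimator(data), data

estimate, data = run_scheme((1.0, 0.0, 0.0), 30, xyz_update_rule,
                            linear_inversion_estimator, random.Random(0))
```

An adaptive scheme is obtained by passing an `update_rule` that actually inspects `data`; the loop itself does not change.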



4. Evaluation 

In order to evaluate the precision of estimates of the true density matrix, we introduce a loss function (sometimes called a cost function). A loss function $\Delta$ is a map from $\mathcal{O} \times \mathcal{O}$ to $\mathbb{R}$ such that (i) $\forall \rho, \sigma \in \mathcal{O}$, $\Delta(\rho, \sigma) \ge 0$ and (ii) $\forall \rho \in \mathcal{O}$, $\Delta(\rho, \rho) = 0$. For example, the trace distance and the infidelity (one minus the fidelity) are loss functions for density matrices. The outcomes of quantum measurements are random variables, and the value of the loss function between an estimate and the true density matrix is also a random variable. Thus, in order to evaluate the precision of the estimator (not the estimate) for the true density matrix, we use the statistical expectation value of the loss function, called an expected loss (sometimes called a risk function) [15]. The explicit form is given by

$$\bar{\Delta}_N(u^N, \rho^{\mathrm{est}}|\rho) := \sum_{D^N \in \mathcal{D}^N} p(D^N|\rho) \, \Delta(\rho_N^{\mathrm{est}}(D^N), \rho). \tag{3}$$

The value of the expected loss depends on the choice of the estimator as well as the true density matrix. The latter is of course unknown in an experiment, and there are at least two approaches to eliminate its dependence, namely the average and the maximal (or worst case) expected loss, given explicitly by

$$\bar{\Delta}_N^{\mathrm{ave}}(u^N, \rho^{\mathrm{est}}) := \int d\mu(\rho) \, \bar{\Delta}_N(u^N, \rho^{\mathrm{est}}|\rho), \tag{4}$$

$$\bar{\Delta}_N^{\max}(u^N, \rho^{\mathrm{est}}) := \max_{\rho \in \mathcal{O}} \bar{\Delta}_N(u^N, \rho^{\mathrm{est}}|\rho), \tag{5}$$

where $\mu$ is a probability measure on $\mathcal{O}$. The task in this paper is to find a combination of a measurement update rule $u^N$ and estimator $\rho^{\mathrm{est}}$ with average expected loss as small as possible.



B. A generalized Cramér-Rao inequality

The A-optimality criterion is a measurement update criterion based on the asymptotic theory of statistical parameter estimation [16], [17]. In this subsection we introduce a few basic results of the asymptotic theory. First let us parametrize the state space $\mathcal{S}(\mathcal{H})$. Any density matrix on a $d$-dimensional Hilbert space can be parametrized by $d^2 - 1$ real numbers, $s \in \mathbb{R}^{d^2-1}$, i.e. $\rho = \rho(s)$. In the $d = 2$ case, we take $\rho(s) = \frac{1}{2}(1 + s \cdot \sigma)$, where $\sigma = (\sigma_1, \sigma_2, \sigma_3)$, $\sigma_\alpha$ ($\alpha = 1, 2, 3$) are the Pauli matrices, and $s \in \mathbb{R}^3$, $\|s\| \le 1$, is called the Bloch vector. The estimation of $\rho$ is equivalent to the estimation of $s$, and we let $s^{\mathrm{est}}$ denote the estimator. Estimates of a density matrix and of a Bloch vector are related as $\rho_n^{\mathrm{est}}(D^n) = \rho(s_n^{\mathrm{est}}(D^n))$.

For any estimator $s^{\mathrm{est}}$, any number of measurement trials $N$, and any positive semidefinite matrix $H(s)$, the inequality

$$\sum_{D^N \in \mathcal{D}^N} p(D^N|s) \, [s_N^{\mathrm{est}}(D^N) - s]^T H(s) [s_N^{\mathrm{est}}(D^N) - s] \ge \mathrm{tr}\big[H(s) \, G_N(u^N, s^{\mathrm{est}}, s)^T F_N(u^N, s)^{-1} G_N(u^N, s^{\mathrm{est}}, s)\big] \tag{6}$$

holds, where

$$p(D^N|s) := p(D^N|\rho(s)), \tag{7}$$

$$G_N(u^N, s^{\mathrm{est}}, s) := \nabla_s \sum_{D^N \in \mathcal{D}^N} p(D^N|s) \, s_N^{\mathrm{est}}(D^N)^T, \tag{8}$$

$$F_N(u^N, s) := \sum_{D^N \in \mathcal{D}^N} \frac{\nabla_s p(D^N|s) \, \nabla_s^T p(D^N|s)}{p(D^N|s)}, \tag{9}$$

and tr denotes the trace operation with respect to the parameter space. Eq. (6) is a known generalization of the Cramér-Rao inequality [18], and we give a simple proof in Appendix B. $F_N(u^N, s)$ is a $(d^2 - 1) \times (d^2 - 1)$ positive semidefinite matrix called the Fisher matrix of the probability distribution $\{p(D^N|s)\}_{D^N \in \mathcal{D}^N}$.

If the estimate converges to the true parameter, i.e., $s_N^{\mathrm{est}}(D^N) \to s$ as $N \to \infty$ with probability 1, the LHS of Eq. (6) converges to 0 and therefore the RHS should converge to 0. In this case, if we assume the exchangeability of the limit and derivative, the matrix $G_N(u^N, s^{\mathrm{est}}, s)$ converges to the identity matrix $I$, and the quantity $K_N(u^N, s)$ defined as

$$K_N(u^N, s) := \mathrm{tr}[H(s) F_N(u^N, s)^{-1}] \tag{10}$$

converges to 0. This $K_N(u^N, s)$ can be interpreted as a lower bound of the weighted (by $H(s)$) mean squared error when $N$ is sufficiently large. It is known that under certain regularity conditions, a maximum likelihood estimator achieves the equality of Eq. (6) asymptotically. For a given $s$, it would be wise to choose a measurement update rule which makes the value of $K_N(u^N, s)$ as small as possible. This is the guiding principle of the A-optimality criterion.



C. A-optimality criteria



We move on to the explanation of the procedure of A-optimality. The "A" stands for "average-variance" [17]. According to the asymptotic theory of statistical parameter estimation described in the previous subsection, we wish to minimize the value of $K_N(u^N, s)$. Suppose that we have performed $n$ trials and obtained the data sequence $D^n$. We would like to choose the POVM minimizing $K_{n+1}(u^{n+1}, s)$ in $\mathcal{M}_{n+1}$ as the next, $(n+1)$-th, measurement. Minimizing this function poses two problems, which we avoid by introducing two approximations. The first problem is that the minimized function depends on the true parameter $s$. Of course the true parameter is unknown in parameter estimation problems, and we must use an estimate in the update criterion, $\tilde{s}_n^{\mathrm{est}}(D^n)$, instead. The measurement update estimator $\tilde{s}^{\mathrm{est}}$ is not necessarily the same as $s^{\mathrm{est}}$. The second problem is that unlike the independent and identically distributed (i.i.d.) measurement case, calculation of the Fisher matrix in the adaptive case requires summing over an exponential amount of data, and is computationally intensive. To avoid this problem, we approximate the sum over all possible measurements by that over only those measurements that have been performed:

$$F_{n+1}(u^{n+1}, s) \approx \tilde{F}_{n+1}(u^{n+1}, s|D^n) := \sum_{i=1}^{n+1} F(\Pi_i, s), \tag{11}$$

where

$$F(\Pi_i, s) := \sum_{x_i \in \mathcal{X}_i} \frac{\nabla_s p(x_i; \Pi_i|s) \, \nabla_s^T p(x_i; \Pi_i|s)}{p(x_i; \Pi_i|s)}, \tag{12}$$

$$\Pi_i = u_i(D^{i-1}), \quad i = 1, \ldots, n. \tag{13}$$

The matrix $F(\Pi_i, s)$ is the Fisher matrix for the $i$-th measurement probability distribution $\{p(x_i; \Pi_i|s)\}_{x_i \in \mathcal{X}_i}$, and $\tilde{F}_{n+1}(u^{n+1}, s|D^n)$ is the sum of the Fisher matrices from the first to the $(n+1)$-th trial. Instead of minimizing $K_{n+1}(u^{n+1}, s)$, we consider the minimization of

$$\tilde{K}_{n+1}(u^{n+1}, s|D^n) := \mathrm{tr}[H(s) \tilde{F}_{n+1}(u^{n+1}, s|D^n)^{-1}]. \tag{14}$$

It is known that the convergence of $\tilde{K}_N(u^N, s|D^N)$ to 0 is part of a sufficient condition for the convergence of a maximum likelihood estimator [19], and this justifies the use of this second approximation. We explain the relationship between the conditional and unconditional Fisher matrices with respect to the estimator's convergence in Appendix C. After making these two approximations, we define the A-optimality criterion as

$$\Pi_{n+1}^{\mathrm{A\text{-}opt}} = u_{n+1}^{\mathrm{A\text{-}opt}}(D^n) := \operatorname*{argmin}_{\Pi_{n+1} \in \mathcal{M}_{n+1}} \mathrm{tr}[H(\tilde{s}_n^{\mathrm{est}}) \tilde{F}_{n+1}(u^{n+1}, \tilde{s}_n^{\mathrm{est}}|D^n)^{-1}]. \tag{15}$$

Finding $\Pi_{n+1}^{\mathrm{A\text{-}opt}}$ is a nonlinear minimization problem with high computational cost in general. In this paper, we derive the analytic solution of Eq. (15) in the 1-qubit case, reducing the computational cost significantly.



D. Estimation setting 

We consider a 1-qubit mixed state estimation problem, so that $\mathcal{O} = \mathcal{S}(\mathbb{C}^2)$. We identify the Bloch parameter space $\{s \in \mathbb{R}^3 \mid \|s\| < 1\}$ with $\mathcal{O}$, where we restrict the true state space to be strictly the interior in order to avoid the possible divergence of the Fisher matrix. Suppose that we can choose any rank-1 projective measurement in each trial. Let $\Pi(a) = \{\Pi_x(a)\}_{x = \pm}$ denote the POVM corresponding to the projective measurement onto the $a$-axis ($a \in \mathbb{R}^3$, $\|a\| = 1$), whose elements can be represented as

$$\Pi_\pm(a) = \frac{1}{2}(1 \pm a \cdot \sigma). \tag{16}$$

This is the Bloch parametrization of projective measurements. We identify the set of parameters $\mathcal{A} = \{a \in \mathbb{R}^3 \mid \|a\| = 1\}$ with the measurement class $\mathcal{M}$ = {All rank-1 projective measurements on a 1-qubit system}.

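For our loss functions, we use both the squared Hilbert-Schmidt distance $\Delta^{\mathrm{HS}}$ and the infidelity $\Delta^{\mathrm{IF}}$ [Ii|:

$$\Delta^{\mathrm{HS}}(s, s') := \frac{1}{2} \mathrm{Tr}[(\rho(s) - \rho(s'))^2] \tag{17}$$

$$= \frac{1}{4} \|s - s'\|^2, \tag{18}$$

$$\Delta^{\mathrm{IF}}(s, s') := 1 - \mathrm{Tr}\Big[\sqrt{\sqrt{\rho(s)} \, \rho(s') \sqrt{\rho(s)}}\Big]^2 \tag{19}$$

$$= \frac{1}{2}\Big(1 - s \cdot s' - \sqrt{1 - \|s\|^2} \sqrt{1 - \|s'\|^2}\Big). \tag{20}$$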
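
As a small illustration of the closed forms in Eqs. (18) and (20), the two loss functions can be evaluated directly from Bloch vectors. This is a minimal sketch; the function names `hs_loss` and `infidelity` are illustrative choices, and the clamping at zero only guards against rounding for vectors on the surface of the Bloch ball.

```python
import math

def hs_loss(s, t):
    """Squared Hilbert-Schmidt distance, Eq. (18): |s - t|^2 / 4."""
    return 0.25 * sum((a - b) ** 2 for a, b in zip(s, t))

def infidelity(s, t):
    """Infidelity, Eq. (20), for Bloch vectors with norm at most 1."""
    dot = sum(a * b for a, b in zip(s, t))
    ns = max(0.0, 1.0 - sum(a * a for a in s))  # 1 - |s|^2, clamped against rounding
    nt = max(0.0, 1.0 - sum(b * b for b in t))  # 1 - |t|^2, clamped against rounding
    return 0.5 * (1.0 - dot - math.sqrt(ns) * math.sqrt(nt))
```

Both functions vanish for identical arguments and equal 1 for orthogonal pure states, e.g. `(0, 0, 1)` and `(0, 0, -1)`.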



We note that the Hilbert-Schmidt distance coincides with the trace distance in a 1-qubit system. The asymptotic behavior of the average expected infidelity $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ is known in the 1-qubit state estimation case [13], [14], [20]. The measure used for calculating this average is the Bures distribution, $d\mu(s) = \frac{1}{\pi^2}(1 - \|s\|^2)^{-1/2} ds$. If we limit our available measurements to be sequential and independent (i.e., nonadaptive), $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ behaves at best as $O(N^{-3/4})$ [13], [20]. On the other hand, if we are allowed to use adaptive, separable, or collective measurements, $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ can behave as $O(N^{-1})$ [14]. In these references, the coefficient of the dominant term in the asymptotic limit is also derived.
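
For Monte Carlo averages over states, Bloch vectors must be drawn from the chosen measure. The sketch below is one possible way to do this and is not taken from the paper: it relies on the geometric fact that a point drawn uniformly from the unit 3-sphere in $\mathbb{R}^4$, projected onto its first three coordinates, has density proportional to $(1 - \|s\|^2)^{-1/2}$ in the unit ball, which is the Bures density quoted above. The uniform (Euclidean) sampler uses plain rejection. Both function names are hypothetical.

```python
import math
import random

def sample_bures_bloch(rng):
    """Draw a Bloch vector with density proportional to (1 - |s|^2)^(-1/2)."""
    # Normalizing four independent Gaussians gives a uniform point on the 3-sphere;
    # dropping the fourth coordinate projects it into the unit ball.
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return tuple(v / norm for v in g[:3])

def sample_euclid_bloch(rng):
    """Draw a Bloch vector uniformly from the unit ball (density 3 / (4 pi))."""
    while True:
        s = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        if sum(v * v for v in s) < 1.0:  # rejection step: keep points inside the ball
            return s
```

A quick sanity check is the mean of $\|s\|^2$, which is 3/4 under the Bures density and 3/5 under the uniform density.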

In Sec. III B 1 we show numerical results. A maximum likelihood estimator is used, and it is shown that the average expected infidelity of an A-optimal scheme behaves as $O(N^{-1})$, illustrating that the A-optimality criterion is indeed making use of adaptation to outperform nonadaptive schemes.



E. Survey of some other update criteria 

We briefly review some of the other adaptive measurement update criteria proposed in the literature, using
our terminology and notation introduced in the previous 
subsections. 



1. Two-step adaptation criterion 

Before explaining update criteria that are performed 
at each and every trial, such as A-optimality, we briefly 
review a simpler update criterion. The two-step adapta- 
tion criterion requires the measurement update only once 
during a measurement sequence. We have 



$$u_{n+1}(D^n) = \begin{cases} \Pi_{\mathrm{1st}} & \text{if } n < N_{\mathrm{1st}} \\ \Pi_{\mathrm{2nd}} & \text{if } n \ge N_{\mathrm{1st}} \end{cases} \tag{21}$$

Thus, for all trials up to and including trial $N_{\mathrm{1st}}$ a fixed POVM $\Pi_{\mathrm{1st}}$ is performed, and an estimate is calculated from the obtained data. Using that data we choose a new POVM $\Pi_{\mathrm{2nd}}$ for the remaining $N_{\mathrm{2nd}} (= N - N_{\mathrm{1st}})$ copies. In [14], [21]-[23], two-step adaptation criteria are used to prove mathematically an asymptotic bound for weighted mean squared errors in 1-qubit state estimation. In [24], [25], some numerical results are shown for a few two-step adaptation schemes.
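
The case distinction of Eq. (21) is simple enough to state as code. The following is a minimal sketch in which POVMs are represented by opaque labels and `choose_second` is a hypothetical callback that picks $\Pi_{\mathrm{2nd}}$ from the first-stage data; how that choice is made depends on the specific two-step scheme and is not fixed here.

```python
def two_step_update(n_done, n_first, povm_first, choose_second, data):
    """Return the POVM for trial n_done + 1 under the two-step criterion of Eq. (21)."""
    if n_done < n_first:
        return povm_first
    # Only the first-stage records (trials 1..n_first) inform the second-stage POVM.
    return choose_second(data[:n_first])

def choose_from_majority(first_stage_data):
    """Hypothetical second-stage rule: pick a label from the majority outcome."""
    plus = sum(1 for _, outcome in first_stage_data if outcome == +1)
    return "Pi_2nd_plus" if 2 * plus >= len(first_stage_data) else "Pi_2nd_minus"

records = [("Pi_1st", +1), ("Pi_1st", +1), ("Pi_1st", -1), ("X", -1), ("X", -1)]
during_first_stage = two_step_update(2, 3, "Pi_1st", choose_from_majority, records)
after_first_stage = two_step_update(3, 3, "Pi_1st", choose_from_majority, records)
```

Because the second-stage choice reads only `data[:n_first]`, later outcomes cannot change the POVM once the second stage has begun, which is what distinguishes this criterion from trial-by-trial updates.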



2. N88 criterion 

In [4], an update criterion based on the Cramér-Rao inequality is proposed. The update criterion is given by

$$u_{n+1}(D^n) = \operatorname*{argmin}_{\Pi \in \mathcal{M}_{n+1}} \mathrm{tr}[H(\tilde{s}_n^{\mathrm{est}}) F(\Pi, \tilde{s}_n^{\mathrm{est}})^{-1}]. \tag{22}$$

The difference from the A-optimality criterion is that in Eq. (22) the Fisher information matrix used in the update does not take into account all $n + 1$ measurements, but only the $(n+1)$-th measurement. The advantage of course is that this reduces the computational cost of updates. The disadvantage is that when $\mathcal{M}_n$ ($n = 1, 2, \ldots$) consists of informationally incomplete POVMs, as is the case in most experiments, the estimates cannot converge to the true state. As explained in Sec. II D, in this paper $\mathcal{M}_n$ is restricted to rank-1 projective measurements, and in this setting Eq. (22) does not work well.



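3. FKF00 criteria

In [7], two update criteria are proposed.

(i) The first criterion is based on the Shannon entropy of the estimated measurement probability distribution, and is given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \Big(-\sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi|\tilde{\rho}_n^{\mathrm{est}}(D^n)) \ln p(x; \Pi|\tilde{\rho}_n^{\mathrm{est}}(D^n))\Big). \tag{23}$$

(ii) The second criterion uses a third state estimator $\hat{\rho}^{\mathrm{est}}$ such that

$$(u_{n+1}(D^n), \hat{\rho}_n^{\mathrm{est}}(D^n)) = \operatorname*{argmax}_{(\Pi, \sigma) \in \mathcal{M}_{n+1} \times \mathcal{O}} \sum_{x \in \mathcal{X}_{n+1}} [\ldots] \tag{24}$$

Numerical simulation is performed for the case where $\mathcal{O}$ is the set of 1-qubit pure states and $\mathcal{M}_n$ is the set of projective measurements, while $\tilde{\rho}^{\mathrm{est}}$ is a biased maximum likelihood estimator and $\rho^{\mathrm{est}}$ is a Bayesian estimator, up to $N = 60$. Average (not expected) infidelity is used as the evaluation function.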



4. HF08 criterion

In [8], an update criterion given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \Big(\int_{\rho \in \mathcal{O}} d\rho \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi|\rho) \, \Delta(\rho_{n+1}^{\mathrm{est}}(D^{n+1}), \rho)\Big) \tag{25}$$

is proposed. A numerical simulation is performed in [8], where the setting is that $\mathcal{O}$ is the set of 1-qubit pure states, $\mathcal{M}$ is a set of parity measurements using an ancilla system, and $\tilde{\rho}^{\mathrm{est}}$ and $\rho^{\mathrm{est}}$ are maximum likelihood estimators. The behavior of the average expected fidelity is numerically analyzed up to $N = 20$.



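5. HF11 criterion

An update criterion proposed in [9] is given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \Big(-\sum_{x \in \mathcal{X}_{n+1}} \sum_{i=1}^{n} \mathrm{Tr}[\Pi_{i,x_i} \Pi_x] \ln \mathrm{Tr}[\Pi_{i,x_i} \Pi_x]\Big), \tag{26}$$

and the estimator is defined as

$$\rho_n^{\mathrm{est}}(D^n) = \operatorname*{argmax}_{\rho \in \mathcal{O}} \mathrm{Tr}[\rho \, \bar{\rho}(D^n)], \tag{27}$$

$$\bar{\rho}(D^n) := \frac{1}{n} \sum_{i=1}^{n} \Pi_{i,x_i}. \tag{28}$$

In the numerical simulations, the estimation setting is such that $\mathcal{O}$ is the set of pure states on a $d$-dimensional Hilbert space $\mathcal{H}$, and $\mathcal{M}_n$ is the set of projective measurements on $\mathcal{H}$. Numerical simulations of average expected fidelity are shown for $d = 2, 4, 6, 8$, and 13, all up to $N = 50$.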



6. FF00 criterion

In [10], an update criterion based on Bayesian estimation and Shannon entropy is proposed. Let $P(\rho)$ denote a prior distribution on $\mathcal{O}$. The update criterion is

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} [\ldots] \tag{29}$$

$$= \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \Big(-\int_{\rho \in \mathcal{O}} d\rho \, P(\rho|D^n) \ln P(\rho|D^n) + \sum_{x \in \mathcal{X}_{n+1}} p^{\mathrm{ave}}(x; \Pi|D^n) \int_{\mathcal{O}} d\rho \, P(\rho|D^{n+1}) \ln P(\rho|D^{n+1})\Big), \tag{30}$$

where

$$p^{\mathrm{ave}}(x; \Pi|D^n) := \int_{\mathcal{O}} d\rho \, P(\rho|D^n) \, p(x; \Pi|\rho), \tag{31}$$

$$P(\rho|D^n) := \frac{P(\rho) \, p(D^n|\rho)}{\int_{\mathcal{O}} d\sigma \, P(\sigma) \, p(D^n|\sigma)}. \tag{32}$$

In [10], the case in which $\mathcal{O}$ is the set of 1-qubit mixed states and $\mathcal{M}$ is the set of projective measurements is numerically analyzed up to $N = 50$. The evaluation function used is the average (not expected) infidelity.



7. HH11 criterion

In [11], an update criterion given by

$$u_{n+1}(D^n) = \operatorname*{argmax}_{\Pi \in \mathcal{M}_{n+1}} \Big(-\sum_{x \in \mathcal{X}_{n+1}} p^{\mathrm{ave}}(x; \Pi|D^n) \ln p^{\mathrm{ave}}(x; \Pi|D^n) + \int_{\mathcal{O}} d\rho \, P(\rho|D^n) \sum_{x \in \mathcal{X}_{n+1}} p(x; \Pi|\rho) \ln p(x; \Pi|\rho)\Big) \tag{33}$$

is proposed, where Eqs. (31) and (32) have been used. From a simple calculation, we can see that the criteria defined in Eq. (30) and in Eq. (33) are equivalent. This criterion involves an integration which requires high computational cost. In [11], a special technique for calculating the integral, called a sequential importance sampling method, is used in order to reduce that computational cost. The authors performed numerical simulation for the case in which $\mathcal{O}$ is the set of 1-qubit mixed states and $\mathcal{M}_n$ are projective measurements up to N = 10''. They also considered the case in which $\mathcal{O}$ is the set of 2-qubit states and $\mathcal{M}$ are a set of mutually unbiased bases, a set of pairwise Pauli measurements, and a set of separable measurements up to N = 10^. The evaluation function is the average expected infidelity, and it is shown that their scheme is more precise than standard quantum tomography. In Sec. III B 1 we point out that our numerical results for 1-qubit show that A-optimality gives even more precise estimates than those given by Eq. (33), at least from $N = 100$ to 1000.



III. RESULTS AND ANALYSIS

As explained in Sec. II D, we consider the A-optimality criterion for 1-qubit state estimation using projective measurements. In Sec. III A we give the analytic solution, and in Sec. III B we show the results of numerical simulations.



A. Analytic solution for A-optimality in 1-qubit state estimation

First, we give the explicit form of the Fisher matrix for projective measurements. The probability distribution for the rank-1 projective measurement $\Pi(a)$ is given by

$$p(\pm; a|s) = \frac{1}{2}(1 \pm s \cdot a), \tag{34}$$

and the Fisher matrix is

$$F(a, s) = \frac{\nabla_s p(+; a|s) \, \nabla_s^T p(+; a|s)}{p(+; a|s)} + \frac{\nabla_s p(-; a|s) \, \nabla_s^T p(-; a|s)}{p(-; a|s)} \tag{35}$$

$$= \frac{a a^T}{1 - (a \cdot s)^2}. \tag{36}$$
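
The step from Eq. (35) to Eq. (36) uses $\nabla_s p(\pm; a|s) = \pm a/2$ and $1/p(+) + 1/p(-) = 4/(1 - (a \cdot s)^2)$. It can be checked numerically with a few lines; the following is a minimal sketch with illustrative function names, using plain nested lists for 3 x 3 matrices.

```python
def fisher_from_definition(a, s):
    """Eq. (35): sum over both outcomes of grad p grad p^T / p, with grad p = +-a/2."""
    dot = sum(ai * si for ai, si in zip(a, s))
    F = [[0.0] * 3 for _ in range(3)]
    for sign in (+1, -1):
        p = 0.5 * (1.0 + sign * dot)          # Eq. (34)
        grad = [0.5 * sign * ai for ai in a]  # gradient of p with respect to s
        for i in range(3):
            for j in range(3):
                F[i][j] += grad[i] * grad[j] / p
    return F

def fisher_closed_form(a, s):
    """Eq. (36): a a^T / (1 - (a.s)^2)."""
    dot = sum(ai * si for ai, si in zip(a, s))
    scale = 1.0 / (1.0 - dot * dot)
    return [[scale * ai * aj for aj in a] for ai in a]

a_test = (0.6, 0.0, 0.8)
s_test = (0.2, -0.3, 0.4)
F_def = fisher_from_definition(a_test, s_test)
F_cf = fisher_closed_form(a_test, s_test)
```

The matrix has rank 1 for any single axis, which is why at least three linearly independent axes are needed before the summed Fisher matrix becomes invertible.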



With this Fisher matrix, Eq. (15) is rewritten in the Bloch vector representation as

$$a_{n+1}^{\mathrm{A\text{-}opt}} := \operatorname*{argmin}_{a \in \mathcal{A}} \mathrm{tr}\big[H(\tilde{s}_n^{\mathrm{est}}) \{\tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n) + F(a, \tilde{s}_n^{\mathrm{est}})\}^{-1}\big]. \tag{37}$$

We present the analytic solution of Eq. (37) in the form of the following theorem.

Theorem 1 Given a sequence of data $D^n = \{(a_1, x_1), \ldots, (a_n, x_n)\}$, the $n$-th estimate $\tilde{s}_n^{\mathrm{est}}$, and a real positive matrix $H$, the A-optimal POVM Bloch vector is given by

$$a_{n+1}^{\mathrm{A\text{-}opt}} = \frac{B_n e_{\min}(C_n)}{\|B_n e_{\min}(C_n)\|}, \tag{38}$$

where

$$B_n = \sqrt{\tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n) \, H(\tilde{s}_n^{\mathrm{est}})^{-1} \, \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n)}, \tag{39}$$

$$C_n = B_n \big(I - \tilde{s}_n^{\mathrm{est}} \tilde{s}_n^{\mathrm{est}\,T} + \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n)^{-1}\big) B_n, \tag{40}$$

$e_{\min}(C_n)$ is the eigenvector of the matrix $C_n$ corresponding to the minimal eigenvalue, and $I$ is the identity in the parameter space.

We give the proof of Theorem 1 in Appendix A.

In Eq. (40), the inverse of the matrix $\tilde{F}_n$ appears. In the proof of Theorem 1 the invertibility of $\tilde{F}_n$ is assumed. The invertibility of $\tilde{F}_n$ is equivalent to the condition that $a^n$ contains a basis of $\mathbb{R}^3$. When we choose the second and third measurements, $\tilde{F}_1$ and $\tilde{F}_2$ are not invertible. Thus the update scheme does not apply to these steps, and the choices are arbitrary. One simple choice is to perform $\sigma_1$-, $\sigma_2$-, and $\sigma_3$-projective measurements at the first, second and third trials respectively, and this can be shown to satisfy Theorem 1 as follows. The choice of the first measurement is always arbitrary, and we choose $a_1 = (1, 0, 0)^T$, a $\sigma_1$-projective measurement. Then for any true Bloch vector $s$ the rank of $\tilde{F}_1$ is 1, and if we interpret the inverse matrix in Eq. (40) as a generalized inverse matrix, $C_1$ is a rank 1 matrix with minimal eigenvalue 0. The supports of $\tilde{F}_1$, $B_1$, and $C_1$ are the span of $\{a_1\}$. Therefore $B_1 e_{\min}(C_1)$ is an arbitrary vector in the 2-dimensional space spanned by $(0, 1, 0)^T$ and $(0, 0, 1)^T$, and we choose $a_2 = (0, 1, 0)^T$. Then using the same logic, the third measurement is fixed to $a_3 = (0, 0, 1)^T$.

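From the explicit formulae of the squared Hilbert-Schmidt distance and infidelity in Eqs. (18) and (20), we have

$$\Delta^{\mathrm{HS}}(s, s') = (s' - s)^T \frac{1}{4} I \, (s' - s), \tag{41}$$

$$\Delta^{\mathrm{IF}}(s, s') = (s' - s)^T \frac{1}{4}\Big(I + \frac{s s^T}{1 - \|s\|^2}\Big)(s' - s) + o(\|s' - s\|^2). \tag{42}$$

Therefore when we use the Hilbert-Schmidt distance as our loss function, we substitute $H^{\mathrm{HS}}(s) := \frac{1}{4} I$ and $H^{\mathrm{HS}}(s)^{-1} = 4I$ into Eqs. (38), (39), and (40) to obtain, up to positive constant factors that do not affect Eq. (38),

$$B_n \propto \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n), \tag{43}$$

$$C_n \propto \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n)(I - \tilde{s}_n^{\mathrm{est}} \tilde{s}_n^{\mathrm{est}\,T}) \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n) + \tilde{F}_n(a^n, \tilde{s}_n^{\mathrm{est}}|D^n), \tag{44}$$

and we do not need to explicitly calculate the inverse or square root matrices for A-optimality. On the other hand, when our loss function is the infidelity, we must use $H^{\mathrm{IF}}(s) := \frac{1}{4}\big(I + \frac{s s^T}{1 - \|s\|^2}\big)$ and $H^{\mathrm{IF}}(s)^{-1} = 4(I - s s^T)$.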
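
The Hilbert-Schmidt case of Theorem 1 can be sketched in a few dozen lines without external libraries. The code below is an illustration under stated assumptions, not the implementation used for the simulations in this paper: 3 x 3 matrices are nested lists, the smallest eigenvector is obtained with a textbook cyclic Jacobi iteration (`jacobi_eigh`, a helper written for this sketch), and `objective_hs` evaluates the quantity minimized in Eq. (37) with $H = I/4$ so that the analytic axis can be compared against other directions.

```python
import math

def fisher(a, s):
    """Eq. (36): Fisher matrix a a^T / (1 - (a.s)^2) of a projective measurement."""
    dot = sum(x * y for x, y in zip(a, s))
    return [[ai * aj / (1.0 - dot * dot) for aj in a] for ai in a]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(3)] for i in range(3)]

def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def jacobi_eigh(A, sweeps=30):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric 3x3 matrix."""
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if A[p][q] == 0.0:
                continue
            diff = A[q][q] - A[p][p]
            theta = (0.5 * math.atan(2.0 * A[p][q] / diff) if diff != 0.0
                     else math.copysign(math.pi / 4.0, A[p][q]))
            c, s = math.cos(theta), math.sin(theta)
            for M in (A, V):                      # column update: M <- M J
                for k in range(3):
                    mp, mq = M[k][p], M[k][q]
                    M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
            for k in range(3):                    # row update: A <- J^T A
                ap, aq = A[p][k], A[q][k]
                A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(3)], V

def a_optimal_axis_hs(axes, s_est):
    """Next Bloch axis from Eqs. (38), (43), (44) for the Hilbert-Schmidt loss."""
    F = [[0.0] * 3 for _ in range(3)]
    for a in axes:                                # sum of past Fisher matrices
        F = matadd(F, fisher(a, s_est))
    N = [[float(i == j) - s_est[i] * s_est[j] for j in range(3)] for i in range(3)]
    C = matadd(matmul(matmul(F, N), F), F)        # Eq. (44), up to a constant factor
    vals, V = jacobi_eigh(C)
    k = vals.index(min(vals))
    e_min = [V[i][k] for i in range(3)]
    b = [sum(F[i][j] * e_min[j] for j in range(3)) for i in range(3)]  # B_n e_min
    norm = math.sqrt(sum(v * v for v in b))
    return [v / norm for v in b]

def objective_hs(axes, s_est, a):
    """tr[H (F_n + F(a, s))^{-1}] with H = I/4, the quantity minimized in Eq. (37)."""
    F = fisher(a, s_est)
    for past in axes:
        F = matadd(F, fisher(past, s_est))
    G = inv3(F)
    return 0.25 * (G[0][0] + G[1][1] + G[2][2])

axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)] * 2 + [(0.6, 0.0, 0.8)]
s_est = (0.5, 0.2, -0.3)
a_opt = a_optimal_axis_hs(axes, s_est)
```

The sign of the returned axis is arbitrary, since $a$ and $-a$ describe the same projective measurement with relabeled outcomes. Note that the update itself needs no matrix inverse or square root; `inv3` is used only by the check function `objective_hs`.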



B. Numerical simulation 

We performed Monte Carlo simulations of the following four experimental designs, described in detail below: the A-optimal adaptive scheme for the squared Hilbert-Schmidt distance, the same for the infidelity, XYZ repetition, and uniformly random selection.

A-optimality for the squared Hilbert-Schmidt distance is the adaptive scheme defined by Eq. (37) with $H = H^{\mathrm{HS}}$. Similarly, A-optimality for the infidelity is that with $H = H^{\mathrm{IF}}$. As explained in the previous subsection, the choice of measurement Bloch vectors at the first and second trials is arbitrary; we choose $a_1 = (1, 0, 0)^T$ and $a_2 = (0, 1, 0)^T$, i.e., at the first trial we perform the projective measurement of $\sigma_1$, and that of $\sigma_2$ at the second. The third trial is then automatically the projective measurement of $\sigma_3$, corresponding to $a_3 = (0, 0, 1)^T$. The XYZ repetition scheme is nonadaptive: we repeat the measurements of $\sigma_1$, $\sigma_2$, and $\sigma_3$, corresponding to standard quantum state tomography. Uniformly random selection is also nonadaptive: at each trial we choose the next measurement direction randomly on the Bloch surface, according to the SO(3) Haar measure. For consistency with the other three schemes, we fix the first, second and third measurements to be the projective measurements of $\sigma_1$, $\sigma_2$, $\sigma_3$, respectively, and randomly select directions from the fourth trial on.

We choose a maximum likelihood estimator in all four experimental designs. It is known that the estimators minimizing $\bar{\Delta}_N^{\mathrm{HS\,ave}}$ and $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ are Bayesian estimators [13, 26], but the integrations necessary for Bayesian estimation take too much computation time. For the two A-optimality criteria, we choose both the real and the dummy estimators to be maximum likelihood, $s^{\mathrm{est}} = \tilde{s}^{\mathrm{est}} = s^{\mathrm{ML}}$. We used a Newton-Raphson method to solve the (log-)likelihood equation, with the completely mixed state $s = 0$ as the initial point of the iterative method. When a search point left the Bloch sphere during the procedure, we chose the previous point (inside the sphere) as the estimate.
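
The Newton-Raphson procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for the simulations: it assumes data of the form `(axis, outcome)` with unit axes spanning $\mathbb{R}^3$ and outcomes $\pm 1$, uses the log-likelihood $\sum_i \ln \frac{1}{2}(1 + x_i a_i \cdot s)$, and interprets the fallback rule as returning the last iterate that was still inside the Bloch ball. The helper names `det3`, `solve3`, and `ml_estimate` are hypothetical.

```python
def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(M, v):
    """Solve M x = v for a 3x3 matrix by Cramer's rule."""
    d = det3(M)
    x = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for r in range(3):
            Mk[r][k] = v[r]                       # replace column k by the right-hand side
        x.append(det3(Mk) / d)
    return x

def ml_estimate(data, max_iter=100, tol=1e-12):
    """Newton-Raphson on the log-likelihood, starting from the completely mixed state."""
    s = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3
        neg_hess = [[0.0] * 3 for _ in range(3)]
        for a, x in data:
            w = 1.0 + x * sum(ai * si for ai, si in zip(a, s))  # 2 p(x; a | s)
            for i in range(3):
                grad[i] += x * a[i] / w
                for j in range(3):
                    neg_hess[i][j] += a[i] * a[j] / (w * w)
        step = solve3(neg_hess, grad)             # Newton step: (-Hessian)^{-1} gradient
        trial = [si + di for si, di in zip(s, step)]
        if sum(v * v for v in trial) >= 1.0:
            return s                              # left the Bloch ball: keep previous point
        s = trial
        if max(abs(d) for d in step) < tol:
            break
    return s

x_axis, y_axis, z_axis = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
data = ([(x_axis, +1)] * 8 + [(x_axis, -1)] * 2 +
        [(y_axis, +1)] * 5 + [(y_axis, -1)] * 5 +
        [(z_axis, +1)] * 3 + [(z_axis, -1)] * 7)
s_ml = ml_estimate(data)
```

For the example data the likelihood factorizes over the three axes, so the estimate coincides with the per-axis frequency differences. When every outcome is $+1$, the first Newton step leaves the ball and the starting point is returned.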

In the following subsections, we show the plots for two loss functions: the squared Hilbert-Schmidt distance $\Delta^{\mathrm{HS}}$ and the infidelity $\Delta^{\mathrm{IF}}$. The average expected losses $\bar{\Delta}_N^{\mathrm{ave}}$ are shown in Sec. III B 1 and pointwise expected losses $\bar{\Delta}_N$ are shown in Sec. III B 2. In both subsections, the line styles are fixed as follows: solid (black) line for A-optimality for the squared Hilbert-Schmidt distance (AHS), dashed (red) line for A-optimality for the infidelity (AIF), chain (blue) line for XYZ repetition (XYZ), dotted (green) line for uniformly random selection (URS).



1. Average expected losses 

We analyse the average behaviour of the estimation errors over the Bloch sphere. The integration for averaging is approximated by a Monte Carlo routine, and the statistical expectation is approximated by an arithmetic mean using pseudo-random numbers.

Figure 2 shows the average expected loss functions $\bar{\Delta}_N^{\mathrm{ave}}$ against the number of trials $N$ (the horizontal and vertical axes are both logarithmic scale): (HS-Bures) $\bar{\Delta}_N^{\mathrm{HS\,ave}}$ integrated via the Bures distribution $\mu_{\mathrm{Bures}}$, (HS-Euclid) $\bar{\Delta}_N^{\mathrm{HS\,ave}}$ integrated via the Euclidean distribution $\mu_{\mathrm{Euclid}}(s) = 3/4\pi$, (IF-Bures) $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ integrated via $\mu_{\mathrm{Bures}}$, and (IF-Euclid) $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ integrated via $\mu_{\mathrm{Euclid}}$.

FIG. 2. Average expected loss $\bar{\Delta}_N^{\mathrm{ave}}(u^N, s^{\mathrm{est}})$ against the number of trials $N$ (the horizontal and vertical axes are both logarithmic scale): (HS-Bures) $\bar{\Delta}_N^{\mathrm{HS\,ave}}$ integrated via the Bures distribution $\mu_{\mathrm{Bures}}$, (HS-Euclid) $\bar{\Delta}_N^{\mathrm{HS\,ave}}$ integrated via the Euclidean distribution $\mu_{\mathrm{Euclid}}(s) = 3/4\pi$, (IF-Bures) $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ integrated via $\mu_{\mathrm{Bures}}$, and (IF-Euclid) $\bar{\Delta}_N^{\mathrm{IF\,ave}}$ integrated via $\mu_{\mathrm{Euclid}}$. The dashed spaced (orange) line in (IF-Bures) is the bound of separable (including adaptive) schemes derived in [14]. The number of measurement trials $N_{\max}$ is 1000, the number of sequences used for the calculation of the statistical expectation values $N_{\mathrm{mean}}$ is 1000, and the number of sample points used for the Monte Carlo integration $N_{\mathrm{MC}}$ is 3200.



Fig. 2 (HS-Bures) and (HS-Euclid) show that the estimation
errors of the four experimental designs are almost equivalent
from the viewpoint of the squared Hilbert-Schmidt distance.
As depicted in (HS-Bures), the estimation errors of the two
A-optimality schemes are slightly larger than those of the two
nonadaptive schemes; as we show in the next subsection
(pointwise analysis), this gap decreases as $N$ becomes larger.
On the other hand, Fig. 2 (IF-Bures) and (IF-Euclid) show a
clear gap between the adaptive and nonadaptive schemes. The
gradients of the curves begin to differ from around $N = 100$,
and as depicted in (IF-Bures), the gradients of XYZ and URS are
almost $-3/4$ around $N = 1000$. This means that the average
expected infidelity behaves as $O(N^{-3/4})$, which is consistent
with the result of the asymptotic analysis presented in [1J|.
On the other hand, the gradients of AHS and AIF are steeper
than the nonadaptive limit $-3/4$, indicating that AHS and AIF
make good use of adaptive resources. Around $N = 1000$ the
gradient of AIF is almost $-1$, which is the bound for adaptive
experimental designs [14].
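
The gradients quoted above are slopes on a log-log plot. As a hedged sketch of how such a slope could be read off from simulated data, the following Python snippet fits a straight line to $(\log N, \log \text{loss})$ by least squares. The helper name `loglog_slope` and the synthetic curve $2N^{-3/4}$ are assumptions for illustration; they are not data from the simulations.

```python
import numpy as np

def loglog_slope(n_values, losses):
    """Least-squares slope of log(loss) against log(N)."""
    slope, _intercept = np.polyfit(np.log(n_values), np.log(losses), 1)
    return float(slope)

# Synthetic curve with the nonadaptive scaling N^(-3/4); the prefactor 2.0 is arbitrary.
n = np.array([100.0, 200.0, 500.0, 1000.0])
synthetic_loss = 2.0 * n ** (-0.75)
slope = loglog_slope(n, synthetic_loss)
```

For noisy simulated losses, the fit window (here $N = 100$ to $1000$) matters, since the curves are only approximately straight over a limited range.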



Let us compare the estimation errors of the A-optimality
and HH11 criteria explained in Sec. II E 6. From
Fig. 2 (IF-Bures), the average expected infidelities of AHS
and AIF are 4.2 x lO'^ and 3.5 x lO'^ at $N = 1000$.
On the other hand, the corresponding value for the
HH11 criterion can be estimated roughly from Fig. 2 (a)
in [13] to be 7.0 x 10~^. This implies that for 1-qubit
state estimation, the average expected infidelity of the
A-optimality criterion is about half that of Eq. (33),
at least around $N = 1000$.

FIG. 3. Pointwise expected loss $\bar{\Delta}_N(u^N, s^{\mathrm{est}}|s)$ against the number of trials $N$ (both axes are
logarithmic): (HS-P1), (HS-P2), and (HS-P3) are the expected squared Hilbert-Schmidt distances for $s$ given by $(r, \theta, \phi) = (0, 0, 0)$,
$(0.99, 0, 0)$, $(0.99, \pi/4, \pi/4)$, and (IF-P1), (IF-P2), and (IF-P3) are the expected infidelities for the same three true
states, respectively. The number of measurement trials $N_{\max}$ is 10000, and the number of sequences used for the calculation of
statistical expectation values $N_{\mathrm{mean}}$ is 1000.



2. Pointwise expected losses 

Next, we analyse the behaviour of the estimation errors
at several true Bloch vectors $s$. Figure 3 shows the
pointwise expected loss functions $\bar{\Delta}_N(u^N|s)$ against
the number of trials $N$ (both axes are logarithmic):
(HS-P1), (HS-P2), and (HS-P3) are plots of the expected
squared Hilbert-Schmidt distances for $s$ given by
$(r, \theta, \phi) = (0, 0, 0)$, $(0.99, 0, 0)$, $(0.99, \pi/4, \pi/4)$, and
(IF-P1), (IF-P2), and (IF-P3) are the expected infidelities
for the same three true states, respectively.
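
For concreteness, the two loss functions can be written directly in terms of Bloch vectors. The Python sketch below is an illustration under assumptions. It uses the unnormalized squared Hilbert-Schmidt distance $\mathrm{tr}[(\rho - \sigma)^2] = |s - t|^2/2$ (the paper's definition may carry a different constant factor) and the standard qubit fidelity $\frac{1}{2}\left(1 + s \cdot t + \sqrt{1 - |s|^2}\sqrt{1 - |t|^2}\right)$. The function names are hypothetical.

```python
import numpy as np

def bloch_from_spherical(r, theta, phi):
    """Bloch vector for spherical coordinates (r, theta, phi)."""
    return r * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

def squared_hs_distance(s, t):
    """tr[(rho - sigma)^2] for qubit states with Bloch vectors s and t."""
    d = np.asarray(s, float) - np.asarray(t, float)
    return 0.5 * float(d @ d)

def infidelity(s, t):
    """1 - F(rho, sigma), with F the (squared) qubit fidelity."""
    s, t = np.asarray(s, float), np.asarray(t, float)
    root = np.sqrt(max(0.0, 1.0 - s @ s)) * np.sqrt(max(0.0, 1.0 - t @ t))
    return 1.0 - 0.5 * (1.0 + float(s @ t) + root)

# The three true states used for the pointwise analysis.
p1 = bloch_from_spherical(0.0, 0.0, 0.0)
p2 = bloch_from_spherical(0.99, 0.0, 0.0)
p3 = bloch_from_spherical(0.99, np.pi / 4, np.pi / 4)
```

With this convention, `p2` points along the third axis, which is the case where the true Bloch vector is aligned with one of the XYZ measurement directions.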

As depicted in (HS-P1) and (IF-P1), the estimation errors
of all four schemes are almost equivalent for the completely
mixed state, $s = 0$. As the Bloch radius $r$ becomes larger,
the differences between the four schemes become clearer.
Figure 3 (HS-P2) and (HS-P3) are the plots of the expected
squared Hilbert-Schmidt distances at a high purity point,
$r = 0.99$. In the region from $N = 10$ to around 7000, the
squared Hilbert-Schmidt error of the two adaptive schemes is
larger than that of the two nonadaptive schemes. In particular,
the error of AHS is larger than that of AIF. This might seem
strange, but in the region $N > 7000$ the error of AHS becomes
smaller than that of AIF; indeed, it eventually becomes the
smallest of the four schemes. We believe that there are two
reasons for A-optimality's large error at small $N$. First, the
A-optimality criterion is based on an asymptotic theory of
statistical estimation. When the number of measurement trials
$N$ is small, the Cramér-Rao bound is not necessarily suitable
for characterizing estimation errors. Second, it uses a dummy
estimator in the measurement update. When $N$ is small,
$s^{\mathrm{est}}$ is not a good estimate, and thus the choice of the next
measurement can be unreliable. Of course, when $N$ becomes
sufficiently large, both of these problems are alleviated.

The gap between the estimation errors of adaptive and
nonadaptive schemes becomes smaller as $N$ becomes larger in
(HS-P2) and (HS-P3), while it grows in (IF-P2) and (IF-P3).
Only the XYZ scheme changes dramatically between (IF-P2) and
(IF-P3); the other three schemes do not, because AHS, AIF, and
URS are invariant under rotation of the true Bloch vector (for
very small $N$ there are differences, because the first three
measurements are fixed to the $\sigma_1$, $\sigma_2$, $\sigma_3$ projective
measurements and are not rotationally invariant). Figure 3
(IF-P2) is the case in which the directions of the measurement
and the true Bloch vector are matched (to $(0, 0, 1)$). In this
case, XYZ is the best scheme, exhibiting the smallest estimation
error. Around $N = 10000$, the estimation error of AIF becomes
as small as that of XYZ. That of AHS is smaller than that of
URS, but larger than those of the other two schemes. We believe
that this is because the Hessian matrix $H^{\mathrm{HS}}$ used in the
update routine is unsuitable for the loss function
$\Delta^{\mathrm{IF}}$ in (IF-P2) (and (IF-P3)). Figure 3 (IF-P3) is the case
in which the directions of the measurement and the true Bloch
vector are the most discrepant (for a fixed purity). In this
case, the estimation errors of XYZ and URS are almost the same
and behave as 0{N~^^'^), and those of the adaptive schemes are
smaller than those of the nonadaptive ones (this behavior of the
expected infidelity for i.i.d. measurements is discussed in
[2, 11], and a detailed analysis will appear in [27]). When we
consider the whole Bloch sphere, the cases in which the
direction of the XYZ measurements and the Bloch vector are
matched are of course few, and therefore the average expected
infidelities of AHS and AIF are smaller than those of XYZ and
URS. This also indicates that the adaptive schemes have better
worst-case performance (lower $\bar{\Delta}_N^{\max}$, Eq. ([U)) than the
nonadaptive schemes.



3. Purity dependence

FIG. 4. Purity dependence of average expected infidelity at
$N = 1000$. Cross (black) for AHS, saltire (red) for AIF,
asterisk (blue) for XYZ, and square (green) for URS. The number
of sequences used for the calculation of the statistical
expectation values $N_{\mathrm{mean}}$ is 1000, and the number of sample
points used for the Monte Carlo integration $N_{\mathrm{mc}}$ is 500 for
each Bloch radius $r$.

Figure 4 shows the purity dependence of the average
expected infidelity at $N = 1000$. The average is taken
over all directions $\theta$ and $\phi$ for each Bloch radius $r$. It
indicates that the average expected infidelities of the two
adaptive schemes are smaller than those of the two
nonadaptive schemes. The appearance of peaks for XYZ and
URS is discussed in Appendix D.



4. Measurement sequences

Figure 5 is a plot of the measurement Bloch vectors at
$N = 100$ (left column), 1000 (middle column), and 10000
(right column) for 900 runs. The true state is
$(r, \theta, \phi) = (0.99, \pi/4, \pi/4)$, and the upper three subplots are
AHS while the lower three are AIF. Figure 5 shows that the
measurement Bloch vectors are clustered around the true
state, with some interesting behaviour at $N = 10000$. In
(AHS-10000), the measurement directions are clustered
very narrowly at the true state and also around the great
circle that it defines. In (AIF-10000), on the other hand,
the directions are clustered widely around the true state.
This is due to the difference between the loss functions
employed in the update routine, namely the squared
Hilbert-Schmidt distance in the former and the infidelity in
the latter. We mention that for a completely mixed true state,
the measurement Bloch vectors are distributed randomly on
the Bloch sphere for large $N$.

FIG. 5. Distribution of measurement Bloch vectors at $N = 100$ (left column), 1000 (middle column), and 10000 (right column)
for 900 runs. The true state is $(r, \theta, \phi) = (0.99, \pi/4, \pi/4)$. The upper three plots (AHS-100), (AHS-1000), and (AHS-10000) are
AHS while the lower three (AIF-100), (AIF-1000), and (AIF-10000) are AIF.



IV. DISCUSSION

A. Implementation

There are two main issues when considering the practical
implementation of an adaptive scheme, namely the ease with
which measurement updates can be made in the apparatus, and
the time required to compute those updates. In quantum
optics, projective measurements and single qubit rotations are
standard tools in quantum information processing experiments.
Figure 6 illustrates a simple implementation example for a one
photon polarization system. In this setting, the first issue is
not a problem; in general, of course, it will depend on the
experimental state of the art.

FIG. 6. An implementation of adaptive projective measurements
for single photon polarization qubits in quantum optics. HWP
and QWP are half and quarter wave plates, PBS is a polarization
beam splitter, PD are photodetectors, and CC denotes classical
computation. The directions of the projective measurements are
adapted by changing the waveplate angles.

B. Generalization to higher dimensional systems

In order to compare the performance of the A-optimality
criterion to the other update schemes, we have considered
1-qubit states as the estimation objective. Current and future
quantum information processing is concerned with higher
dimensional estimation objectives, not only states but also
processes. In 1-qubit state estimation, we can reduce the
computational cost for A-optimality by using the analytic
solution of Theorem 1, but as we see in Appendix A, the
techniques used to derive that solution depend on the
properties of 1-qubit states and projective measurements.
A-optimality in higher dimensional systems will need a new
solution, or must deal with the increasing complexity of the
nonlinear minimization problem. One possible approach is to
place constraints on the measurement class $\mathcal{A}_n$. Instead
of considering a continuous set of measurement candidates, we
could consider a discrete set. One expects that the resulting
discrete minimization problem would be much simpler. If the
number of discrete measurement candidates is too small,
however, the estimation error could be worse than that of
standard quantum tomography. The relation between the
reduction in computational cost and the (probable) increase in
estimation error caused by introducing such a discrete
minimization is an open problem.
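
The discrete-candidate idea raised in the discussion of higher dimensional systems can be illustrated for the 1-qubit case, where the objective is known explicitly. The Python sketch below is a toy example under assumptions: the candidate set ($\pm x$, $\pm y$, $\pm z$), the matrices `v` and `h`, and the function names are all hypothetical. It evaluates $\mathrm{tr}[H\{V + F(a, s)\}^{-1}]$ with $F(a, s) = a a^T / (1 - (a \cdot s)^2)$ for each candidate and picks the smallest.

```python
import numpy as np

def a_opt_objective(a, v, h, s):
    """tr[H (V + F(a, s))^{-1}] for a projective measurement with Bloch vector a."""
    fisher = np.outer(a, a) / (1.0 - (a @ s) ** 2)
    return float(np.trace(h @ np.linalg.inv(v + fisher)))

def best_discrete_candidate(candidates, v, h, s):
    """Index and objective value of the best candidate by exhaustive search."""
    values = [a_opt_objective(a, v, h, s) for a in candidates]
    k = int(np.argmin(values))
    return k, values[k]

# Hypothetical inputs: accumulated Fisher matrix v, Hessian h, current estimate s.
candidates = np.vstack([np.eye(3), -np.eye(3)])
v = np.diag([5.0, 1.0, 3.0])
h = np.eye(3)
s = np.array([0.0, 0.0, 0.5])
index, value = best_discrete_candidate(candidates, v, h, s)
```

In this toy case the search selects the $y$ direction, along which the accumulated information in `v` is smallest. The cost of the search grows linearly with the number of candidates, which is the trade-off mentioned in the text.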



V. SUMMARY 

In this paper, we considered adaptive experimental design
and applied a measurement update method known in statistics
as the A-optimality criterion to 1-qubit mixed state estimation
using arbitrary rank-1 projective measurements. We derived an
analytic solution of the A-optimality update procedure in this
case, reducing the complexity of measurement updates
considerably. Our analytic solution is applicable to any case in
which the loss function can be approximated by a quadratic
function to lowest order. We performed Monte Carlo simulations
of this and several nonadaptive schemes in order to compare the
behaviour of estimation errors for a finite number of
measurement trials. We compared the average and pointwise
expected squared Hilbert-Schmidt distance and infidelity of the
following four measurement update criteria: A-optimality for
the squared Hilbert-Schmidt distance (AHS), A-optimality for
the infidelity (AIF), repetition of three orthogonal projective
measurements (XYZ), and uniformly random selection of
projective measurements (URS). The numerical results showed
that, with respect to expected infidelity, AHS and AIF give more
precise estimates than URS and XYZ, the latter of which
corresponds to standard quantum tomography.

Appendix A: Proof of Theorem 1

We give the proof of Theorem 1. First, we introduce a
lemma about matrix inverses.

Lemma 1 \2a] Let $V$ denote a $k \times k$ invertible matrix.
Let us consider a matrix $W = V + v v^T$, where $v$ is a
$k$-dimensional vector. If $W$ is not singular, then

$$W^{-1} = V^{-1} - \frac{V^{-1} v v^T V^{-1}}{1 + v^T V^{-1} v}. \tag{A1}$$

By substituting $v = a/\sqrt{1 - (a \cdot s)^2}$ into Eq. (A1) (in
our case $k = 3$ and $V = \tilde{F}_n$), we obtain

$$\{V + F(a, s)\}^{-1} = V^{-1} - \frac{V^{-1} a a^T V^{-1}}{1 - (a \cdot s)^2 + a^T V^{-1} a} \tag{A2}$$

and

$$\mathrm{tr}[H(s)\{V + F(a, s)\}^{-1}] = \mathrm{tr}[H(s) V^{-1}] - \frac{a^T V^{-1} H(s) V^{-1} a}{1 - (a \cdot s)^2 + a^T V^{-1} a}. \tag{A3}$$

The first term on the RHS of Eq. (A3) is independent of
$a$, and therefore we obtain

$$\mathop{\mathrm{argmin}}_{a \in \mathcal{A}} \mathrm{tr}[H(s)\{V + F(a, s)\}^{-1}] = \mathop{\mathrm{argmax}}_{a \in \mathcal{A}} \frac{a^T V^{-1} H(s) V^{-1} a}{a^T (I - s s^T + V^{-1}) a} \tag{A4}$$

$$= \mathop{\mathrm{argmin}}_{a \in \mathcal{A}} \frac{a^T (I - s s^T + V^{-1}) a}{a^T V^{-1} H(s) V^{-1} a}, \tag{A5}$$

where we used the relation $1 = a^T I a$. Let us introduce
a vector

$$b = \frac{\sqrt{V^{-1} H(s) V^{-1}}\, a}{\|\sqrt{V^{-1} H(s) V^{-1}}\, a\|}. \tag{A6}$$

Note that $b$ and $a$ take values in the same set, so that
the vector $a$ can be represented in terms of $b$ as

$$a = \frac{\sqrt{V H(s)^{-1} V}\, b}{\|\sqrt{V H(s)^{-1} V}\, b\|}. \tag{A7}$$

Then the function to be minimized is represented using
$b$ as

$$\frac{a^T (I - s s^T + V^{-1}) a}{a^T V^{-1} H(s) V^{-1} a} = b^T \sqrt{V H(s)^{-1} V}\, (I - s s^T + V^{-1}) \sqrt{V H(s)^{-1} V}\, b. \tag{A8}$$

The vector $b$ minimizing Eq. (A8) is the eigenvector with
the minimal eigenvalue of the matrix

$$C := \sqrt{V H(s)^{-1} V}\, (I - s s^T + V^{-1}) \sqrt{V H(s)^{-1} V}, \tag{A9}$$

i.e., $b = e_{\min}(C)$. By substituting $V = \tilde{F}_n$ and $s = s^{\mathrm{est}}_n$
into Eqs. (A7) and (A9), we obtain Theorem 1. ■
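
The closed-form direction obtained in Appendix A can be checked numerically. The Python sketch below is an illustration under assumptions, not the authors' code: the matrices `v` and `h` and the vector `s` are arbitrary test inputs, and the function names are hypothetical. It builds $C = \sqrt{V H^{-1} V}\,(I - s s^T + V^{-1})\sqrt{V H^{-1} V}$, takes the eigenvector with the smallest eigenvalue, maps it back to a unit vector $a$, and compares the resulting value of $\mathrm{tr}[H\{V + F(a, s)\}^{-1}]$ with a brute-force search over random unit vectors.

```python
import numpy as np

def sym_sqrt(m):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, u = np.linalg.eigh((m + m.T) / 2.0)
    return (u * np.sqrt(w)) @ u.T

def a_opt_direction(v, h, s):
    """Unit vector a minimizing tr[H (V + a a^T / (1 - (a.s)^2))^{-1}]."""
    m = sym_sqrt(v @ np.linalg.inv(h) @ v)
    c = m @ (np.eye(3) - np.outer(s, s) + np.linalg.inv(v)) @ m
    _, vecs = np.linalg.eigh((c + c.T) / 2.0)  # eigenvalues in ascending order
    a = m @ vecs[:, 0]                         # eigenvector of the smallest eigenvalue
    return a / np.linalg.norm(a)

def objective(a, v, h, s):
    fisher = np.outer(a, a) / (1.0 - (a @ s) ** 2)
    return float(np.trace(h @ np.linalg.inv(v + fisher)))

rng = np.random.default_rng(1)
g = rng.normal(size=(3, 3))
v = g @ g.T + np.eye(3)          # arbitrary positive-definite test matrix
h = np.diag([1.0, 2.0, 3.0])     # arbitrary positive-definite Hessian
s = np.array([0.3, -0.2, 0.5])   # arbitrary Bloch vector inside the ball

a_star = a_opt_direction(v, h, s)
best = objective(a_star, v, h, s)

trial = rng.normal(size=(2000, 3))
trial /= np.linalg.norm(trial, axis=1, keepdims=True)
brute = min(objective(a, v, h, s) for a in trial)
```

The sign of `a_star` is arbitrary, since $a$ and $-a$ describe the same projective measurement with relabelled outcomes.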



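Appendix B: Proof of a generalized Cramér-Rao inequality

We give a proof of Eq. (6). We consider a general probability
distribution $\{p(y|\theta)\}_{y \in \mathcal{Y}}$, where $\mathcal{Y}$ is a set of
outcomes and $\theta \in \Theta \subset \mathbb{R}^k$. It does not necessarily obey
Born's rule. We assume differentiability of a sufficient
order with respect to $\theta$. From the definition of probability
distributions, we obtain

$$1 = \sum_{y \in \mathcal{Y}} p(y|\theta), \tag{B1}$$

and, taking the gradient of both sides with respect to $\theta$,

$$0 = \sum_{y \in \mathcal{Y}} \nabla_\theta p(y|\theta) = \sum_{y \in \mathcal{Y}} p(y|\theta) \nabla_\theta \ln p(y|\theta), \tag{B2}$$

where we assumed that $p(y|\theta) > 0$ for all $\theta \in \Theta$ and all
$y \in \mathcal{Y}$. This assumption is valid for all full rank density
matrices in any finite dimensional system. Conversely, there
can exist non full rank density matrices which do not satisfy
the assumption. This is the reason why we restrict our
estimation objective to mixed states, the interior of the
Bloch sphere, in Sec. II D. Let us define a $k \times k$ matrix $G$ as

$$G := \nabla_\theta \sum_{y \in \mathcal{Y}} p(y|\theta)\, \theta^{\mathrm{est}}(y)^T \tag{B3}$$

$$= \sum_{y \in \mathcal{Y}} p(y|\theta) \nabla_\theta \ln p(y|\theta)\, (\theta^{\mathrm{est}}(y) - \theta)^T, \tag{B4}$$

where we used Eq. (B2). For any vectors $u$ and $w$ in $\mathbb{R}^k$,
we obtain

$$[u^T G w]^2 = \Big[ \sum_{y \in \mathcal{Y}} p(y|\theta)\, \{u^T \nabla_\theta \ln p(y|\theta)\}\, \{(\theta^{\mathrm{est}}(y) - \theta)^T w\} \Big]^2 \tag{B5}$$

$$\le \Big( \sum_{y \in \mathcal{Y}} p(y|\theta) [u^T \nabla_\theta \ln p(y|\theta)]^2 \Big) \times \Big( \sum_{y' \in \mathcal{Y}} p(y'|\theta) [(\theta^{\mathrm{est}}(y') - \theta)^T w]^2 \Big) = (u^T F u)(w^T E w), \tag{B6}$$

where

$$E := \sum_{y \in \mathcal{Y}} p(y|\theta)\, (\theta^{\mathrm{est}}(y) - \theta)(\theta^{\mathrm{est}}(y) - \theta)^T, \tag{B7}$$

$$F := \sum_{y \in \mathcal{Y}} p(y|\theta)\, \nabla_\theta \ln p(y|\theta)\, \nabla_\theta^T \ln p(y|\theta). \tag{B8}$$

Therefore, for all $u$ and $w$, we obtain

$$w^T E w \ge \frac{u^T G w\, w^T G^T u}{u^T F u}. \tag{B9}$$

We would like to obtain an inequality as tight as possible,
so let us consider the maximization of the RHS of
Eq. (B9). It is maximized when $u \propto F^{-1} G w$, and the
maximal value is $w^T G^T F^{-1} G w$. We obtain a matrix
inequality

$$E \ge G^T F^{-1} G. \tag{B10}$$

Multiplying Eq. (B10) by a positive semidefinite matrix $H$
and taking the trace, we obtain

$$\mathrm{tr}[H E] \ge \mathrm{tr}[H G^T F^{-1} G]. \tag{B11}$$

By substituting $\mathcal{Y} = \mathcal{D}^N$, $\theta = s$, and $\theta^{\mathrm{est}} = s^{\mathrm{est}}_N$, we
obtain Eq. (6). When the estimator $\theta^{\mathrm{est}}$ is unbiased,
i.e., $\sum_{y \in \mathcal{Y}} p(y|\theta)\, \theta^{\mathrm{est}}(y) = \theta$, the matrix $G$ is the
identity matrix, and we obtain the (standard) Cramér-Rao
inequality:

$$E \ge F^{-1}. \tag{B12}$$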
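
The maximization over $u$ that leads from Eq. (B9) to Eq. (B10) can be spelled out as follows. This is a short worked step, assuming that $F$ is positive definite so that $F^{\pm 1/2}$ exist; the auxiliary vectors $x$ and $g$ are introduced here for illustration only.

```latex
% Change of variables: x := F^{1/2} u,  g := F^{-1/2} G w.
\frac{u^T G w \, w^T G^T u}{u^T F u}
  = \frac{(x^T g)^2}{x^T x}
  \le g^T g
  = w^T G^T F^{-1} G w ,
% with equality if and only if x \propto g, i.e. u \propto F^{-1} G w.
% Hence w^T ( E - G^T F^{-1} G ) w \ge 0 for every w, which is Eq. (B10).
```

The inequality in the middle is the Cauchy-Schwarz inequality for the vectors $x$ and $g$.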



Appendix C: Conditional Fisher matrices 



In this section we explain the relation between conditional
and unconditional Fisher matrices. From a simple
calculation, we can obtain

$$F_N(u^N, s) = \sum_{D^{N-1} \in \mathcal{D}^{N-1}} p(D^{N-1}|s)\, \tilde{F}_N(u^N, s|D^{N-1}), \tag{C1}$$

where the sum is taken over $D^{N-1} \in \mathcal{D}^{N-1}$. This is
the reason why $\tilde{F}_N$ is called the conditional Fisher matrix
of $F_N$. In statistical parameter estimation theory,
it is known that the divergence of the conditional Fisher
matrix ($\tilde{F}_N \to \infty$ as $N \to \infty$) almost everywhere in
$\mathcal{D}^N$ is part of a sufficient condition for the convergence
(known as strong consistency in statistics) of a maximum
likelihood estimator JJ>]. If we assume that the other
elements of the set of sufficient conditions are satisfied,
the divergence of the conditional Fisher matrix is sufficient
for the convergence of a MLE. In this case, from
Eq. (C1), the unconditional Fisher matrix also diverges
($F_N \to \infty$), and this is equivalent to the condition that
$\mathrm{tr}[F_N^{-1}] \to 0$. Therefore, the divergence of the
unconditional Fisher matrix is a necessary condition for the
convergence of a MLE. The divergence of $F_N$ is, however,
not sufficient for the convergence of a MLE.

We illustrate this with a simple example. Suppose
that our estimation objective is $\mathcal{O} = \mathcal{S}(\mathbb{C}^2)$. At the
first trial, we perform a POVM $\Pi = \{\Pi_{\mathrm{T}}, \Pi_{\mathrm{F}}\}$, where
$\Pi_{\mathrm{T}} = \Pi_{\mathrm{F}} = \frac{1}{2} I$. We obtain the outcomes T and F, each
with probability 1/2. When we obtain the outcome T at
the first measurement, we perform standard quantum
tomography for all the remaining trials. In this case, a MLE
converges to the true state, and the conditional Fisher
matrix $\tilde{F}_N(u^N|D^N)$ for any $D^N$ that includes $x_1 = \mathrm{T}$ diverges.
Let $\tilde{F}_N(u^N|\mathrm{T})$ denote this conditional Fisher matrix. On
the other hand, when we obtain F in the first measurement,
we repeat the same POVM $\Pi$ for the remaining
trials. Let $\tilde{F}_N(u^N|\mathrm{F})$ denote the conditional Fisher matrix
for any $D^N$ that includes $x_1 = \mathrm{F}$. In this case, no estimator
converges to the true state because the POVM $\Pi$ does
not give us any information (the probability distribution
is $(1/2, 1/2)$, independent of the true state). Then we
obtain $\tilde{F}_N(u^N|\mathrm{F}) = 0$. The unconditional Fisher matrix is
calculated as

$$F_N(u^N, s) = \frac{1}{2} \tilde{F}_N(u^N|\mathrm{T}) + \frac{1}{2} \tilde{F}_N(u^N|\mathrm{F}) \tag{C2}$$

$$= \frac{1}{2} \tilde{F}_N(u^N|\mathrm{T}) \tag{C3}$$

$$\to \infty, \tag{C4}$$

i.e., the unconditional Fisher matrix $F_N$ diverges even
though, with probability 1/2, no estimator converges to the
true state. Therefore the divergence of $F_N$ is necessary,
but not sufficient, for the convergence of a MLE.

As we can see from the above example, in adaptive
experimental designs, the essential characteristic of the
scheme is not the unconditional Fisher matrix but the
conditional Fisher matrices. In order to make a MLE
converge, we need to design an experiment such that
almost all (not necessarily strictly all) the conditional
Fisher matrices diverge. From this point of view, the
approximation Eq. (11) lies at the heart of adaptive
experimental designs.
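
The example above can be reproduced numerically. The Python sketch below is a minimal illustration under assumptions: standard tomography is modelled as cycling through the $x$, $y$, $z$ projective measurements, each contributing $a a^T / (1 - (a \cdot s)^2)$ to the Fisher matrix, and the true Bloch vector `s` and function names are hypothetical. It shows that $\mathrm{tr}[F_N^{-1}]$ of the unconditional Fisher matrix shrinks with $N$ even though the conditional Fisher matrix for a first outcome F stays zero.

```python
import numpy as np

def xyz_fisher(s, n_trials):
    """Fisher matrix of n_trials projective measurements cycling through x, y, z."""
    fisher = np.zeros((3, 3))
    for i in range(n_trials):
        a = np.eye(3)[i % 3]
        fisher += np.outer(a, a) / (1.0 - (a @ s) ** 2)
    return fisher

def fisher_matrices(s, n):
    """Unconditional and conditional Fisher matrices for the two-branch example."""
    cond_t = xyz_fisher(s, n - 1)  # first outcome T: tomography on the remaining n-1 trials
    cond_f = np.zeros((3, 3))      # first outcome F: the trivial POVM gives no information
    return 0.5 * cond_t + 0.5 * cond_f, cond_t, cond_f

s = np.array([0.1, 0.2, 0.3])      # hypothetical true Bloch vector
traces = [float(np.trace(np.linalg.inv(fisher_matrices(s, n)[0])))
          for n in (10, 100, 1000)]
_, cond_t, cond_f = fisher_matrices(s, 1000)
```

The list `traces` decreases with $N$, while `cond_f` is identically zero, which is the situation described by Eqs. (C2) to (C4).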



FIG. 7. Purity dependence of average expected infidelity of XYZ and URS schemes, plotted against the number of trials $N$. The average is taken over all directions $\theta$
and $\phi$ for each Bloch radius $r$. Average expected infidelity of XYZ repetition (left) and URS (right) for different Bloch radii.
Solid line (black): $r = 0$, dotted line (green): $r = 0.7$, dotted spaced line (blue): $r = 0.9$, dashed line (light blue): $r = 0.93$,
dashed spaced line (purple): $r = 0.97$, and dotted dashed line (red): $r = 0.99$. The number of sequences used for the calculation
of the statistical expectation values $N_{\mathrm{mean}}$ is 1000, and the number of sample points used for the Monte Carlo integration $N_{\mathrm{mc}}$
is 500 for each Bloch radius $r$.



Appendix D: Purity dependence of XYZ and URS schemes

In Fig. 4 it is shown that the average expected infidelities
of XYZ and URS at $N = 1000$ have a peak around
$r = 0.97$. Here we explain the origin of the peak. Fig. 7
is a plot of the average expected infidelity for six Bloch radii
(purities) $r$. We choose six purities from the fourteen
purities in Fig. 4 to make things easier to see. The average
is taken over all directions $\theta$ and $\phi$ for each Bloch radius
$r$; (XYZ) is for XYZ and (URS) is for URS. Roughly
speaking, the plots can be interpreted as straight lines
with different slopes and $y$-intercepts on a log-log scale.
As the purity ($r$) increases, two things occur: (i) the slope
of the curves becomes less steep, and (ii) the $y$-intercept
decreases. At $N = 1000$, these two effects combine in
such a way as to create a peak in the estimation error
around $r = 0.97$.
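
The mechanism described above can be illustrated with a toy model. In the Python sketch below, every number is hypothetical and chosen only to show the effect; none is taken from the simulations. Each purity is assigned a straight line $\log_{10} \text{loss} = \log_{10} c - \alpha \log_{10} N$, with the slope magnitude $\alpha$ and the intercept $\log_{10} c$ both decreasing as the purity increases.

```python
import numpy as np

# Hypothetical values, ordered from low purity (index 0) to high purity (index 4).
alphas = np.array([1.00, 0.90, 0.80, 0.70, 0.60])       # slope becomes less steep
log10_c = np.array([0.00, -0.20, -0.45, -0.80, -1.30])  # intercept decreases

def log10_loss(n):
    """Toy log-log lines evaluated at n trials, one value per purity."""
    return log10_c - alphas * np.log10(n)

peak_at_10 = int(np.argmax(log10_loss(10.0)))
peak_at_1000 = int(np.argmax(log10_loss(1000.0)))
```

At $N = 10$ the largest toy loss belongs to the lowest purity, whereas at $N = 1000$ it belongs to an intermediate purity, so a peak appears at a fixed, sufficiently large $N$.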
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